[QEff Finetune]: Enable PP+DDP #394

Merged
quic-mamta merged 17 commits into quic:main from the pp_ddp branch
Jul 18, 2025

quic-mamta
Contributor

@quic-mamta quic-mamta commented May 8, 2025

Added support for PP+DDP

Command for PP only (the number of pipeline stages must be less than or equal to the total number of available devices):
QAIC_VISIBLE_DEVICES=0,1,2,3 python -m QEfficient.cloud.finetune --device qaic --enable_pp --num_pp_stages 4

Command for DDP only:
QAIC_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node 4 -m QEfficient.cloud.finetune --device qaic --enable_ddp

Command for PP+DDP, for 4 qaic devices (1 Ultra) with 2 pipeline stages:
QAIC_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc-per-node 2 -m QEfficient.cloud.finetune --device qaic --enable_ddp --enable_pp --num_pp_stages 2
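In the PP+DDP example, 4 visible devices with 2 pipeline stages are launched with 2 processes per node, which suggests that each DDP replica owns one full pipeline and the process count is the device count divided by the stage count. The sketch below encodes that reading. The helper names `ddp_replicas` and `build_command` are hypothetical and are not part of QEfficient, and the even-divisibility check is an assumption inferred from the example rather than a rule stated in this PR.

```python
from typing import Sequence


def ddp_replicas(num_devices: int, num_pp_stages: int) -> int:
    """Return the number of DDP processes for a PP+DDP launch.

    Assumption (inferred from the 4-device, 2-stage example): each DDP
    replica uses `num_pp_stages` devices, so replicas = devices / stages.
    """
    if num_pp_stages < 1:
        raise ValueError("num_pp_stages must be at least 1")
    # Stated in the PR: stages must not exceed the available devices.
    if num_pp_stages > num_devices:
        raise ValueError("num_pp_stages must be <= number of devices")
    # Assumption: devices must split evenly across replicas.
    if num_devices % num_pp_stages != 0:
        raise ValueError("devices must be a multiple of num_pp_stages")
    return num_devices // num_pp_stages


def build_command(device_ids: Sequence[int], num_pp_stages: int) -> str:
    """Assemble the PP+DDP launch line using the flags shown in this PR."""
    replicas = ddp_replicas(len(device_ids), num_pp_stages)
    env = "QAIC_VISIBLE_DEVICES=" + ",".join(str(d) for d in device_ids)
    args = [
        "torchrun", "--nproc-per-node", str(replicas),
        "-m", "QEfficient.cloud.finetune",
        "--device", "qaic",
        "--enable_ddp", "--enable_pp",
        "--num_pp_stages", str(num_pp_stages),
    ]
    return env + " " + " ".join(args)


if __name__ == "__main__":
    # Reproduces the PP+DDP command from the PR description.
    print(build_command([0, 1, 2, 3], 2))
```

Calling `build_command([0, 1, 2, 3], 2)` yields the same PP+DDP command line as in the description above, while a request for more stages than devices raises a `ValueError` before anything is launched.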

Signed-off-by: Mamta Singh <[email protected]>
@quic-mamta quic-mamta marked this pull request as draft May 8, 2025 07:55
@quic-mamta quic-mamta self-assigned this May 8, 2025
@quic-mamta quic-mamta changed the title Enable PP+DDP [QEff Finetune]: Enable PP+DDP May 8, 2025
@quic-mamta quic-mamta requested review from vbaddi and quic-swatia May 8, 2025 07:58
@quic-mamta quic-mamta force-pushed the pp_ddp branch 2 times, most recently from e8b1da7 to df36ae1 Compare May 8, 2025 08:34
@quic-mamta quic-mamta force-pushed the pp_ddp branch 8 times, most recently from 3ca1229 to 53ff3c4 Compare May 11, 2025 19:37
Contributor

@quic-meetkuma quic-meetkuma left a comment

Good work, Mamta! Please address the comments. Let us discuss offline if anything is confusing.

Signed-off-by: Mamta Singh <[email protected]>
Signed-off-by: Mamta Singh <[email protected]>
@quic-mamta quic-mamta force-pushed the pp_ddp branch 3 times, most recently from 92a4ec1 to a67091a Compare July 14, 2025 06:58
Contributor

@quic-meetkuma quic-meetkuma left a comment

Update the core logic of layer splitting and make it simpler. Refine the documentation and make it look better.

Contributor

@quic-meetkuma quic-meetkuma left a comment

Overall good work. Please address the comments. :)

Signed-off-by: Mamta Singh <[email protected]>
Signed-off-by: Mamta Singh <[email protected]>
Contributor

@quic-meetkuma quic-meetkuma left a comment

A few minor comments remain. See if they can be addressed in this PR; if not, that is fine and they can be taken up later. Overall looks good. Thanks for the substantial code cleanup and the multiple experiments validating that PP+DDP works, Mamta. :)

Signed-off-by: Mamta Singh <[email protected]>
@quic-mamta quic-mamta marked this pull request as ready for review July 18, 2025 13:11
@quic-mamta quic-mamta merged commit 5b7a315 into quic:main Jul 18, 2025
4 checks passed
5 participants